Exploratory data analysis

Author

Antoine Thomas

Published

August 3, 2023

Exploratory data analysis involves examining the data sets imported as part of the data collection process. The first step is to understand the datasets and their variables, gain basic insights, and perform tasks such as computing CAQI-Index values. Another step is to investigate missing observations and how to deal with them. Finally, the actual analysis begins with examining distributions by calculating summary statistics and creating visualizations. Temporal trends and anomalies as well as outliers can be identified. Correlations, especially linear relationships between pollutant concentrations and weather or traffic data, are then examined.

Loading data sets from data collection

# Import of data sets which have been exported at the end of data collection
air_weather_df <- read_csv2(file = "Daten/DataCollection/air_weather_df.csv") %>%
  select(-...1)
traffic_df <- read_csv2(file = "Daten/DataCollection/traffic_df.csv") %>%
  select(-...1)
traffic_detectors <- read_csv2(file = "Daten/DataCollection/traffic_detectors.csv") %>%
  select(-...1)
airquality_stations <- read_csv2(file = "Daten/DataCollection/airquality_stations.csv") %>%
  select(-...1)

# Extracting stations and groups from airquality_stations
airquality_station_groups <- airquality_stations %>%
  select(name, stationgroups) %>%
  distinct()

# At first sight, a lot of air quality monitoring data appears to be 
# missing before the beginning of 2017. This data is therefore excluded 
# from the analysis
air_weather_df <- air_weather_df %>%
  filter(date >= as.Date("2017-01-01"))

# Grouping air quality monitoring data by station
airweather_by_station <- tibble(Station = airquality_station_groups %>% pull(name)) %>%
  mutate(messwerte = map(Station, function(x) air_weather_df %>% filter(Station == x)))

According to the data available to us, there are 20 air quality monitoring stations in Berlin, evenly distributed throughout the city and its immediate surroundings. These can be divided into 3 categories: “Suburb”, “Background” and “Traffic”. “Suburb” stations are located in the suburbs of Berlin, some of them in forests. “Background” includes stations that are located in the city, but collect typical measurements for residential areas. “Traffic” stations are located in the immediate vicinity of a major road, and their readings are likely to be strongly influenced by traffic.

Overview of Datasets

The following data sets contain the essential data used for the analysis in this work: pollutant concentration data, weather data and traffic data. All data sets contain records ranging from January 2017 to April/May 2023.

Pollutant Measurement and Weather Data

The first imported dataset contains weather and air pollution measurements recorded at various stations across Berlin. The recorded parameters include the date and time of the respective measurements, a station name, a station category as ‘traffic’, ‘suburb’ or ‘background’, two levels of particulate matter (PM2.5 and PM10), ozone (O3) and nitrogen dioxide (NO2) concentrations. Weather conditions are described by dew point temperature at 2 metres, amount of precipitation, relative humidity at 2 metres, surface air pressure, temperature at 2 metres, wind direction at 100 metres, wind speed at 100 metres and duration of sunshine.

Rows: 1,117,140
Columns: 15
$ date                <dttm> 2017-01-01 01:00:00, 2017-01-01 02:00:00, 2017-01…
$ Station             <chr> "010 Wedding", "010 Wedding", "010 Wedding", "010 …
$ stationgroups       <chr> "background", "background", "background", "backgro…
$ pm25                <dbl> 175, 99, 63, 29, 20, 20, 26, 27, 29, 23, 19, 17, 1…
$ pm10                <dbl> 185, 104, 67, 31, 22, 23, 28, 30, 32, 25, 21, 20, …
$ O3                  <dbl> 8, 19, 22, 32, 34, 30, 26, 26, 25, 34, 38, 39, 41,…
$ NO2                 <dbl> 48, 37, NA, NA, 21, 24, 24, 25, 26, 22, 22, 21, 21…
$ dewpoint_2m         <dbl> -1.3, -1.6, -1.9, -2.3, -2.7, -3.1, -3.5, -4.0, -2…
$ precipitation       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ relativehumidity_2m <dbl> 81, 79, 77, 76, 74, 72, 71, 69, 77, 75, 73, 69, 68…
$ surface_pressure    <dbl> 1019.2, 1018.7, 1017.9, 1017.4, 1016.6, 1015.7, 10…
$ temperature_2m      <dbl> 1.6, 1.7, 1.7, 1.6, 1.5, 1.4, 1.2, 1.1, 0.8, 0.6, …
$ winddirection_100m  <dbl> 235, 232, 233, 236, 236, 235, 236, 236, 231, 229, …
$ windspeed_100m      <dbl> 7.24, 7.62, 7.92, 7.85, 7.85, 7.96, 8.07, 8.13, 8.…
$ duration_sunlight   <dbl> 466, 466, 466, 466, 466, 466, 466, 466, 466, 466, …

Traffic Data

The imported data set with traffic data provides measurements from various stations across Berlin. Each entry includes the station’s identifier, the date and time of the measurement, and a quality indicator for the collected data.

The traffic data include the hourly vehicle count and the average speed in kilometres per hour, both overall and broken down by cars and trucks. They thus provide a detailed view of the city’s traffic flow.

Rows: 12,465,926
Columns: 9
$ cs_shortname <chr> "TE001", "TE001", "TE001", "TE001", "TE001", "TE001", "TE…
$ date         <dttm> 2017-01-01 00:00:00, 2017-01-01 01:00:00, 2017-01-01 02:…
$ quality      <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1…
$ q_kfz_mq_hr  <dbl> 87, 319, 271, 180, 111, 77, 98, 108, 166, 317, 808, 1287,…
$ v_kfz_mq_hr  <dbl> 77, 74, 80, 83, 88, 86, 78, 73, 68, 72, 63, 65, 88, 78, 7…
$ q_pkw_mq_hr  <dbl> 86, 311, 268, 174, 108, 73, 93, 105, 147, 306, 781, 1269,…
$ v_pkw_mq_hr  <dbl> 78, 76, 81, 84, 89, 86, 78, 73, 68, 72, 62, 65, 89, 78, 7…
$ q_lkw_mq_hr  <dbl> 1, 8, 3, 6, 3, 4, 5, 3, 19, 11, 27, 18, 28, 22, 22, 17, 2…
$ v_lkw_mq_hr  <dbl> 0, 11, 3, 56, 30, 83, 75, 83, 74, 73, 77, 76, 52, 63, 83,…

Observation and handling of missing values

When looking for missing pollutant values, it is noticeable that some stations provide very reliable data, while others provide very unreliable data or no data at all. “Traffic” type stations hardly record any data for O3. This is not much of a problem, as this pollutant is not relevant for the calculation of the CAQI index at “traffic” stations (see section 3.3). Some “suburb” stations are also characterised by missing data for PM2.5 and PM10. For some other stations, almost 70% of the values are missing for the period January 2017 to May 2023. The decision to include a station in the further analysis is based on the proportion of missing data points for each pollutant: if a station has more than 25% missing observations for more than two pollutants, it is excluded from our dataset. The following table shows the relative proportion of missing values and the decision whether or not to exclude a station.

# Function to compute the relative amount of missing values (NA) for each station and pollutant 
calc_na_amount <- function(st, df) {
  df %>%
    map(., ~((sum(is.na(.))/length(.))*100)) %>%
    as_tibble() %>%
    mutate(Station = st)
}

# Function to classify stations regarding the amount of missing data
classify_eliminate <- function(vec) {
  ifelse(sum(vec == T)>2, T, F)
}


# Computing the relative amount of NAs for each pollutant and station.
# Providing information whether respective Station will be excluded or not. 
rel_na_amount_by_station <- airweather_by_station %>%
  mutate(na_amount = map2(Station, messwerte, calc_na_amount)) %>%
  pull(na_amount) %>%
  bind_rows() %>%
  select(Station,
         pm25,
         pm10,
         O3,
         NO2) %>%
  mutate(missing_pm25 = ifelse(.$pm25 > 25, T, F),
         missing_pm10 = ifelse(.$pm10 > 25, T, F),
         missing_O3 = ifelse(.$O3 > 25, T, F),
         missing_NO2 = ifelse(.$NO2 > 25, T, F)) %>%
  mutate(missing_agg = pmap(list(missing_pm25,missing_pm10,missing_O3,missing_NO2),c)) %>%
  mutate(eliminate = map(missing_agg, classify_eliminate),
         eliminate = as.logical(eliminate))


rel_na_amount_by_station %>%
  select(Station, pm25, pm10, O3, NO2, eliminate) %>%
  kable(caption = "Relative amount of missing values for air quality stations and decision whether to eliminate a station or not.")


# Removing not needed functions
remove(calc_na_amount,
       classify_eliminate)


# Stations, providing nearly complete data
stations_values_complete <- rel_na_amount_by_station %>%
  filter(missing_pm25 == F,
         missing_pm10 == F,
         missing_O3 == F,
         missing_NO2 == F,
         eliminate == F) %>%
  pull(Station)

# Stations, which do not monitor O3 values
stations_O3_missing <- rel_na_amount_by_station %>%
  filter(missing_O3 == T,
         eliminate == F) %>%
  pull(Station)

# Stations, which do not monitor PM2.5/PM10 values
stations_PM_missing <- rel_na_amount_by_station %>%
  filter(missing_pm25 == T,
         missing_pm10 == T,
         eliminate == F) %>%
  pull(Station)


# Stations, their monitoring values as well as their missing values and type
airweather_by_used_stations <- tibble(Station = stations_values_complete, missing_values = NA) %>%
  rbind(tibble(Station = stations_O3_missing, missing_values = "O3")) %>%
  rbind(tibble(Station = stations_PM_missing, missing_values = "PM2.5/PM10")) %>%
  inner_join(airweather_by_station) %>%
  left_join(airquality_station_groups, by = c("Station" = "name"))

# Removing not needed data
remove(stations_values_complete,
       stations_O3_missing,
       stations_PM_missing)
Source: Berlin Open Data - dl-de/by-2-0 URI: luftdaten.berlin.de
Missing values per pollutant in percent; eliminate = TRUE marks excluded stations.

Station                       pm25         pm10         O3           NO2          eliminate
010 Wedding                   0.3437349    0.3437349    0.4887481    0.7053726    FALSE
018 Schöneberg                100.0000000  100.0000000  100.0000000  0.3491058    TRUE
027 Marienfelde               100.0000000  100.0000000  0.5048606    0.4976995    FALSE
032 Grunewald                 2.5368351    2.5332546    1.1529441    0.9524321    FALSE
042 Neukölln                  0.6337612    0.6337612    0.4457812    0.6606155    FALSE
077 Buch                      0.8002578    0.7966772    0.7984675    1.0688007    FALSE
085 Friedrichshagen           0.8342732    0.8324829    0.8002578    0.7716132    FALSE
115 Hardenbergplatz           100.0000000  100.0000000  62.3413359   0.7716132    TRUE
117 Schildhornstraße          0.6516641    0.6534544    100.0000000  0.5675206    FALSE
124 Mariendorfer Damm         0.3974435    0.3992338    100.0000000  0.2846555    FALSE
143 Silbersteinstraße         0.2184149    0.2184149    100.0000000  0.4153463    FALSE
145 Frohnau                   100.0000000  100.0000000  0.6212292    0.6391321    FALSE
171 Mitte                     5.0915731    1.7168842    100.0000000  0.7393881    FALSE
174 Frankfurter Allee         0.8020481    0.7948869    31.5555794   0.6427126    FALSE
220 Karl-Marx-Straße          29.5039118   29.5021215   100.0000000  29.7420198   TRUE
282 Karlshorst                100.0000000  100.0000000  100.0000000  0.3777503    TRUE
088 Messwagen Leipziger Str.  52.7937412   53.4257121   53.5134361   53.5044847   TRUE
014 Sondermessstation         100.0000000  100.0000000  100.0000000  56.1147215   TRUE
190 Leipziger Straße          50.8906672   50.8906672   100.0000000  50.8942478   TRUE
221 Karl-Marx-Straße          69.6636053   69.6636053   100.0000000  70.9526111   TRUE
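
The exclusion rule can be cross-checked with a compact, vectorised sketch. It is not the code used above, only an equivalent formulation under the same 25% threshold. The tibble `na_share` and the column `n_deficient` are hypothetical names, and the three rows are copied from the table.

```r
library(dplyr)
library(tibble)

# Missing-value shares in percent, copied from the table for three stations
na_share <- tribble(
  ~Station,              ~pm25,     ~pm10,     ~O3,        ~NO2,
  "010 Wedding",         0.3437349, 0.3437349, 0.4887481,  0.7053726,
  "027 Marienfelde",     100,       100,       0.5048606,  0.4976995,
  "115 Hardenbergplatz", 100,       100,       62.3413359, 0.7716132
)

threshold <- 25  # percent of missing observations per pollutant

na_share <- na_share %>%
  mutate(
    # number of pollutants whose missing share exceeds the threshold
    n_deficient = rowSums(across(c(pm25, pm10, O3, NO2), ~ .x > threshold)),
    # a station is dropped once more than two pollutants are deficient
    eliminate = n_deficient > 2
  )

print(select(na_share, Station, n_deficient, eliminate))
```

With these rows, Wedding has no deficient pollutant, Marienfelde has two (PM2.5 and PM10) and is kept, and Hardenbergplatz has three and is excluded, which matches the table.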

How to handle the missing pollutant data is not straightforward. Simply deleting all observations with missing values can result in a significant amount of data being lost. For some pollutants there are longer periods where no measurements are available, but there are also very short periods of a few hours where the data probably could not be collected correctly. Estimating data over a longer period of time is problematic because measurements may vary irregularly over many consecutive hours or days. In such a case, estimation with, for example, linear methods runs the risk of producing values that later do more harm than good as training data. Over very short periods of time, however, large fluctuations are much less likely, so estimating the missing data is reasonable. Therefore, gaps of at most 4 consecutive missing observations are filled with interpolated values. In this way, the proportion of missing observations can be reduced to a certain extent without running too great a risk of neglecting large fluctuations in the measured values.

# Function to approximate values by interpolation for given columns in a dataframe 
estimate_by_interpolation <- function(df, columns, maximum_gap) {
  for (col in columns) {
    # if a column has no values at all, skip it and continue with the next column
    if (all(is.na(df[[col]]))) { 
      next
    }
    
    na_start <- min(which(!is.na(df[[col]]))) # index of first non-NA value
    na_end <- max(which(!is.na(df[[col]]))) # index of last non-NA value
    
    # approximate values between first and last non-NA value
    df[[col]][na_start:na_end] <- na.approx(
      df[[col]][na_start:na_end], maxgap = maximum_gap
      )
    
  }
  return(df)
}

# Replace missing values by Interpolation for all pollutants
airweather_by_used_stations <- airweather_by_used_stations %>%
  mutate(messwerte = map(.x = messwerte, estimate_by_interpolation, 
                         c("pm25", "pm10", "O3", "NO2"), 
                         4))
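
The effect of the maximum gap can be illustrated on a toy vector. This is a minimal sketch with made-up values; it only shows how `na.approx()` from the zoo package treats gaps that are shorter or longer than `maxgap`.

```r
library(zoo)

# Toy hourly series: one gap of 2 missing values and one gap of 5
x <- c(10, NA, NA, 16, 20, NA, NA, NA, NA, NA, 8)

# Gaps of at most 4 consecutive NAs are filled linearly, longer gaps are kept
x_filled <- na.approx(x, maxgap = 4, na.rm = FALSE)
print(x_filled)
```

The two-hour gap is filled with 12 and 14, while the five-hour gap remains NA because it exceeds the maximum gap of 4.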

To examine missing traffic data, the traffic data set was first restricted considerably. For our analysis, only traffic detectors located in the immediate vicinity of the “traffic” type air quality stations in use were selected. These were identified through a detailed examination. Between 2 and 3 traffic sensors were identified at each of the 4 air quality stations. In total, data from 9 traffic sensors are considered.

The traffic data set contains some anomalies, such as records showing a speed of -1 km/h, which suggests missing or invalid data. In most cases, a speed reading of -1 km/h occurs when no vehicles were recorded at a station at that time. A closer look at the records with -1 km/h revealed hardly any observations where the number of recorded vehicles was greater than 0. It was therefore decided that no further treatment is necessary, especially as the speed values are not used in the further analysis.
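
A check of this kind could look as follows. The sketch uses a small made-up tibble, `traffic_toy`, as a stand-in for the traffic data; the column names follow the data set shown above. Reading -1 as "no valid speed" is an assumption based on the observation described here.

```r
library(dplyr)

# Toy stand-in for the traffic data (made-up values)
traffic_toy <- tibble(
  cs_shortname = c("TE001", "TE001", "TE001", "TE002"),
  q_kfz_mq_hr  = c(0, 120, 0, 3),   # vehicles per hour
  v_kfz_mq_hr  = c(-1, 74, -1, -1)  # average speed in km/h, -1 assumed to mean "no valid speed"
)

# Among the records with a speed of -1 km/h, count those with and without vehicles
speed_check <- traffic_toy %>%
  filter(v_kfz_mq_hr == -1) %>%
  summarise(
    n_minus_one    = n(),
    n_without_cars = sum(q_kfz_mq_hr == 0),
    n_with_cars    = sum(q_kfz_mq_hr > 0)
  )
print(speed_check)
```

If `n_with_cars` is small relative to `n_minus_one` on the real data, the -1 readings are almost entirely explained by hours without traffic.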

Calculation of CAQI-Index

The calculation of the CAQI-Index is based on the theory explained in the associated work. The pollutant concentrations are combined into a single index that can be easily understood and compared. Firstly, the sub-index for individual pollutants is calculated, then a combined index is computed. The combined index is the maximum of the individual subindex values, reflecting the pollutant with the highest concentration relative to its own threshold. Once the index is calculated, it is then classified into one of several qualitative categories (e.g., ‘very low’, ‘low’, ‘medium’, ‘high’, ‘very high’), which provide a more intuitive understanding of the air quality.
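
Written out, the sub-index is a linear interpolation within the concentration band that contains the measured value, followed by rounding. The notation mirrors the arguments of the function defined below; the worked example uses the NO2 band from 50 to 99, which maps to index values 25 to 49.

```latex
% Sub-index for a concentration C inside the band [C_low, C_high],
% which is mapped linearly onto the index range [I_low, I_high]
I = \operatorname{round}\!\left(
      \frac{I_{high} - I_{low}}{C_{high} - C_{low}} \,(C - C_{low}) + I_{low}
    \right)

% Worked example: NO2 concentration C = 75 in the band [50, 99] -> [25, 49]
I_{NO_2} = \operatorname{round}\!\left( \frac{49 - 25}{99 - 50}\,(75 - 50) + 25 \right)
         = \operatorname{round}(37.24) = 37

% Combined index: maximum over the sub-indices
I_{CAQI} = \max\left( I_{NO_2},\, I_{PM_{10}},\, I_{O_3},\, I_{PM_{2.5}} \right)
```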

# Compute the CAQI value for a single pollutant
calculcate_single_aqi <- function(C, C_low, C_high, I_low, I_high) {
  round(((I_high - I_low) / (C_high - C_low)) * (C - C_low) + I_low, 0)
}

# Compute the combined CAQI value
compute_aqi <- function(NO2 = 0, PM10 = 0, O3 = 0, PM25 = 0) {
  if (!is.numeric(NO2)) {
    NO2 <- 0
  }
  if (!is.numeric(PM10)) {
    PM10 <- 0
  }
  if (!is.numeric(O3)) {
    O3 <- 0
  }
  if (!is.numeric(PM25)) {
    PM25 <- 0
  }

  NO2_index <- case_when(
    between(NO2, 0, 49) ~ calculcate_single_aqi(NO2, 0, 49, 0, 24),
    between(NO2, 50, 99) ~ calculcate_single_aqi(NO2, 50, 99, 25, 49),
    between(NO2, 100, 199) ~ calculcate_single_aqi(NO2, 100, 199, 50, 74),
    between(NO2, 200, 400) ~ calculcate_single_aqi(NO2, 200, 400, 75, 100),
    NO2 > 400 ~ 101
  )
  PM10_index <- case_when(
    between(PM10, 0, 24) ~ calculcate_single_aqi(PM10, 0, 24, 0, 24),
    between(PM10, 25, 49) ~ calculcate_single_aqi(PM10, 25, 49, 25, 49),
    between(PM10, 50, 89) ~ calculcate_single_aqi(PM10, 50, 89, 50, 74),
    between(PM10, 90, 180) ~ calculcate_single_aqi(PM10, 90, 180, 75, 100),
    PM10 > 180 ~ 101
  )
  # Each sub-index is interpolated between the limits of the pollutant's own band
  O3_index <- case_when(
    between(O3, 0, 59) ~ calculcate_single_aqi(O3, 0, 59, 0, 24),
    between(O3, 60, 119) ~ calculcate_single_aqi(O3, 60, 119, 25, 49),
    between(O3, 120, 179) ~ calculcate_single_aqi(O3, 120, 179, 50, 74),
    between(O3, 180, 240) ~ calculcate_single_aqi(O3, 180, 240, 75, 100),
    O3 > 240 ~ 101
  )
  PM25_index <- case_when(
    between(PM25, 0, 14) ~ calculcate_single_aqi(PM25, 0, 14, 0, 24),
    between(PM25, 15, 29) ~ calculcate_single_aqi(PM25, 15, 29, 25, 49),
    between(PM25, 30, 54) ~ calculcate_single_aqi(PM25, 30, 54, 50, 74),
    between(PM25, 55, 110) ~ calculcate_single_aqi(PM25, 55, 110, 75, 100),
    PM25 > 110 ~ 101
  )
  return(max(c(NO2_index, PM10_index, O3_index, PM25_index)))
}

# Function to Compute the Qualitative Name of respective CAQI-Index values
get_qualitative_name <- function(caqi_index) {
  case_when(
    caqi_index %in% c(0:24) ~ "very low",
    caqi_index %in% c(25:49) ~ "low",
    caqi_index %in% c(50:74) ~ "medium",
    caqi_index %in% c(75:100) ~ "high",
    caqi_index > 100 ~ "very high"
  )
}


# Compute all CAQI-Index values and qualitative names
plan(multisession, workers = 4)
air_weather_df <- airweather_by_used_stations %>%
  pull(messwerte) %>%
  bind_rows() %>%
  mutate(
    caqi_index = unlist(future_pmap(
      .l = list(
        if_else(is.na(NO2), 0, NO2),
        if_else(is.na(pm10), 0, pm10),
        if_else(is.na(O3), 0, O3),
        if_else(is.na(pm25), 0, pm25)
      ),
      compute_aqi
    )),
    caqi_type = get_qualitative_name(caqi_index),
    .after = NO2
  )

# Grouping air quality monitoring data by station
airweather_by_used_stations <- airweather_by_used_stations %>%
  mutate(messwerte = map(Station, function(x) air_weather_df %>% filter(Station == x)))

Relationships between Pollutants, Weather Variables, and Traffic

Creation of interaction features

By creating interaction features, one can see whether the effect of one variable on the outcome variable depends on the value of another variable. Thus, somewhat more complex relationships can be observed.
Interaction features between the variables PM2.5, PM10, O3, NO2, wind speed, temperature and relative humidity were created to examine their relationship with the CAQI index. They are created by multiplying the values of the interacting variables. Then, the correlation between each interaction feature and the CAQI index is computed. The following correlation values show that the interactions between all pollutants (especially O3) and temperature are positively correlated with the CAQI index. The interactions of PM2.5, PM10 and O3 with relative humidity and wind speed are also clearly positively correlated with the CAQI index, whereas the interactions involving NO2 show only weak correlations (below 0.13). This suggests that wind speed, temperature and relative humidity may influence the effect of these pollutants on the CAQI index. However, such relationships are complex, and fully capturing the factors behind these effects would require more complex algorithms and most probably additional features not contained in our data.

Source: Berlin Open Data - dl-de/by-2-0, Open-Meteo URI: luftdaten.berlin.de, open-meteo.com Own representation

Interaction Feature         Correlation with CAQI index
pm25_windspeed_100m         0.3121248
pm25_temperature_2m         0.2924907
pm25_relativehumidity_2m    0.3570096
pm10_windspeed_100m         0.3877188
pm10_temperature_2m         0.3641073
pm10_relativehumidity_2m    0.4494474
O3_windspeed_100m           0.4842067
O3_temperature_2m           0.6830223
O3_relativehumidity_2m      0.4928495
NO2_windspeed_100m          0.0347842
NO2_relativehumidity_2m     0.0889495
NO2_temperature_2m          0.1274054
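
The construction of the interaction features described above can be sketched as follows. The code that produced the table is not shown in this section, so this is only an illustration under assumptions: `toy` is a made-up stand-in for the air quality data, only four of the twelve features are built, and a plain Pearson correlation via `cor()` is assumed.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the air quality data (made-up values)
toy <- tibble(
  pm25           = c(12, 30, 8, 22, 15, 40),
  O3             = c(60, 20, 85, 35, 70, 10),
  temperature_2m = c(18.5, 2.1, 24.0, 7.3, 20.2, -1.0),
  windspeed_100m = c(12.4, 6.1, 15.0, 9.8, 11.2, 4.5),
  caqi_index     = c(28, 49, 35, 36, 30, 62)
)

interaction_cor <- toy %>%
  # an interaction feature is the product of the two interacting variables
  mutate(
    pm25_windspeed_100m = pm25 * windspeed_100m,
    pm25_temperature_2m = pm25 * temperature_2m,
    O3_windspeed_100m   = O3 * windspeed_100m,
    O3_temperature_2m   = O3 * temperature_2m
  ) %>%
  # Pearson correlation of every interaction feature with the CAQI index
  summarise(across(
    c(pm25_windspeed_100m, pm25_temperature_2m,
      O3_windspeed_100m, O3_temperature_2m),
    ~ cor(.x, caqi_index, use = "pairwise.complete.obs")
  )) %>%
  # one row per interaction feature, as in the table above
  pivot_longer(everything(),
               names_to = "interaction_feature",
               values_to = "correlation")
print(interaction_cor)
```

The toy values are chosen arbitrarily, so the resulting correlations are not comparable with those in the table.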

To conclude, the exploratory analysis of Berlin’s air quality revealed patterns in pollution levels, weather and traffic data. Initial challenges around missing pollutant data were tackled by excluding stations with large shares of missing observations. Remaining short gaps were filled by interpolation, limited to at most 4 consecutive missing values. The established CAQI-Index methodology was used to combine the individual pollutant levels into one representative figure. Breaking these index values down into qualitative categories gives an intuitive and comparable view of air quality in the city of Berlin.

The temporal analysis showed trends that varied with station location, weekday versus weekend, and season. Traffic-heavy areas recorded more pollutants, especially NO2, than urban background and suburban stations. Regular workdays showed higher pollutant levels, with the exception of O3, which rose during the weekends. Seasonally, NO2 peaked during the colder fall and winter months, while O3 levels climbed during spring and summer. Similar patterns were mirrored in the CAQI index values, underlining the influence of individual pollutants on overall air quality.

The exploratory data analysis also considered the interplay between pollutants, weather conditions and traffic patterns. Wind speed, temperature and humidity showed significant correlations with pollution levels. Wind speed stood out for its negative correlation, suggesting that it helps to disperse pollutants and possibly improves air quality. Traffic was associated with higher NO2 and CAQI index levels, indicating a link between heavier traffic and declining air quality. Crucially, we found that the impact of pollutants on the CAQI index was moderated by weather variables. The interactions between pollutants and temperature, humidity and wind speed showed positive correlations with the CAQI index, suggesting that these elements jointly shape air quality. In essence, this exploratory data analysis paves the way for more complex machine learning models, with the goal of predicting air quality in Berlin.

Session info

R version 4.2.1 (2022-06-23)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggmap_3.0.1     furrr_0.3.1     future_1.28.0   corrr_0.4.4    
 [5] zoo_1.8-11      knitr_1.40      lubridate_1.8.0 forcats_0.5.2  
 [9] stringr_1.4.1   dplyr_1.1.2     purrr_0.3.5     readr_2.1.3    
[13] tidyr_1.2.1     tibble_3.2.1    ggplot2_3.4.0   tidyverse_1.3.2

loaded via a namespace (and not attached):
 [1] httr_1.4.4          bit64_4.0.5         vroom_1.6.0        
 [4] jsonlite_1.8.3      modelr_0.1.9        assertthat_0.2.1   
 [7] highr_0.9           sp_1.5-1            googlesheets4_1.0.1
[10] cellranger_1.1.0    yaml_2.3.6          globals_0.16.1     
[13] pillar_1.9.0        backports_1.4.1     lattice_0.20-45    
[16] glue_1.6.2          digest_0.6.30       RColorBrewer_1.1-3 
[19] rvest_1.0.3         colorspace_2.0-3    htmltools_0.5.5    
[22] plyr_1.8.7          pkgconfig_2.0.3     broom_1.0.1        
[25] listenv_0.8.0       haven_2.5.1         scales_1.2.1       
[28] jpeg_0.1-9          tzdb_0.3.0          googledrive_2.0.0  
[31] farver_2.1.1        generics_0.1.3      ellipsis_0.3.2     
[34] withr_2.5.0         cli_3.6.1           magrittr_2.0.3     
[37] crayon_1.5.2        readxl_1.4.1        evaluate_0.17      
[40] fs_1.6.2            fansi_1.0.3         parallelly_1.32.1  
[43] xml2_1.3.3          tools_4.2.1         hms_1.1.2          
[46] RgoogleMaps_1.4.5.3 gargle_1.2.1        lifecycle_1.0.3    
[49] munsell_0.5.0       reprex_2.0.2        compiler_4.2.1     
[52] rlang_1.1.1         grid_4.2.1          rstudioapi_0.14    
[55] htmlwidgets_1.6.2   labeling_0.4.2      bitops_1.0-7       
[58] rmarkdown_2.17      gtable_0.3.1        codetools_0.2-18   
[61] curl_4.3.3          DBI_1.1.3           R6_2.5.1           
[64] gridExtra_2.3       bit_4.0.4           fastmap_1.1.0      
[67] utf8_1.2.2          stringi_1.7.8       Rcpp_1.0.9         
[70] parallel_4.2.1      vctrs_0.6.3         png_0.1-7          
[73] dbplyr_2.2.1        tidyselect_1.2.0    xfun_0.39